MMDS 2008: Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis, Part II

Author

  • Gunnar E. Carlsson
Abstract

Algorithmic Approaches to Networked Data

In an algorithmic perspective on improved models for data, Milena Mihail of the Georgia Institute of Technology began by describing the recent development of a rich theory of power-law random graphs, i.e., graphs that are random conditioned on a specified input power-law degree distribution. With the increasingly wide range of large-scale social and information networks now available, however, generative models that are structurally or syntactically more flexible have become necessary. Mihail described two such extensions: one in which semantics on nodes is modeled by a feature vector, with edges added between nodes based on their semantic proximity, and another in which the phenomenon of associativity/disassociativity is modeled by fixing the probability that nodes of a given degree d_i link to nodes of degree d_j.

A small extension in the parameters of a generative model, of course, can lead to a large increase in the range of properties observed in the generated graphs. This observation raises interesting statistical questions about model overfitting, and argues for more refined and systematic methods for model parameterization. It also leads to some of the new algorithmic questions that were the topic of Mihail's talk.

Mihail posed the following algorithmic problem for the basic power-law random graph model: given as input an N-vector specifying a degree sequence, determine whether a graph with that degree sequence exists and, if it does, efficiently generate one (perhaps approximately uniformly at random from the ensemble of such graphs). Such realizability problems have a long history in graph theory and theoretical computer science. Because their solutions are intimately related to the theory of graph matchings, many generalizations of the basic problem can be addressed in a strict theoretical framework.
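The basic realizability problem stated above can be illustrated with the classical Havel-Hakimi procedure. This is a minimal sketch, not the method presented in the talk: the function name `havel_hakimi` and the edge-list output format are assumptions made for illustration, and the sketch returns one deterministic realization rather than a (near-)uniform random sample from the ensemble.

```python
from typing import List, Optional, Tuple


def havel_hakimi(degrees: List[int]) -> Optional[List[Tuple[int, int]]]:
    """Return the edge list of one simple graph with the given degree
    sequence, or None if no such graph exists.

    Illustrative helper (hypothetical name); vertices are 0..N-1.
    """
    n = len(degrees)
    # Quick rejections: degrees out of range, or an odd degree sum.
    if any(d < 0 or d >= n for d in degrees) or sum(degrees) % 2:
        return None

    residual = list(degrees)  # degree each vertex still has to receive
    edges: List[Tuple[int, int]] = []

    for _ in range(n):
        # Take a vertex with the largest remaining degree as the hub.
        v = max(range(n), key=lambda i: residual[i])
        d = residual[v]
        if d == 0:
            break  # every vertex is saturated
        residual[v] = 0

        # Connect the hub to the d other vertices of largest remaining degree.
        others = sorted(
            (i for i in range(n) if i != v),
            key=lambda i: residual[i],
            reverse=True,
        )[:d]
        for u in others:
            if residual[u] == 0:
                return None  # not enough unsaturated partners
            residual[u] -= 1
            edges.append((min(u, v), max(u, v)))

    return edges


if __name__ == "__main__":
    print(havel_hakimi([3, 3, 2, 2, 2]))  # an edge list realizing the sequence
    print(havel_hakimi([3, 3, 1, 1]))     # None: no simple graph exists
```

Once a hub is saturated it is never chosen as a partner again, so the sketch cannot produce a repeated edge or a self-loop. Sampling approximately uniformly from all realizations, as the problem statement allows, would need an additional randomization step that is not shown here.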
One such generalization, motivated by associative/disassociative networks, is the joint-degree matrix realization problem, on which Mihail described recent progress: given a partition of the node set into classes of vertices of the same degree, a vector specifying the degree of each class, and a matrix specifying the number of edges between any two classes, determine whether such a graph exists and, if it does, construct one. She also described extensions of this basic problem to connected graphs, to finding minimum-cost realizations, and to finding a random graph satisfying those basic constraints.
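The input to the joint-degree matrix problem can be made concrete with a few counting checks that any simple-graph realization must pass. The sketch below is an assumption-laden illustration, not the algorithm from the talk: the function name `jdm_counting_checks` and the argument layout (`sizes`, `degrees`, `J`) are hypothetical, the code only tests necessary conditions, and it does not construct a graph.

```python
from typing import Sequence


def jdm_counting_checks(
    sizes: Sequence[int],
    degrees: Sequence[int],
    J: Sequence[Sequence[int]],
) -> bool:
    """Check necessary counting conditions for a joint-degree matrix.

    sizes[a]   : number of vertices in class a (assumed input layout)
    degrees[a] : common degree of every vertex in class a
    J[a][b]    : number of edges between class a and class b
    """
    k = len(sizes)
    if len(degrees) != k or len(J) != k or any(len(row) != k for row in J):
        return False

    for a in range(k):
        for b in range(k):
            # Edge counts must be non-negative and symmetric.
            if J[a][b] < 0 or J[a][b] != J[b][a]:
                return False
            # A simple graph cannot hold more edges than vertex pairs.
            if a == b:
                cap = sizes[a] * (sizes[a] - 1) // 2
            else:
                cap = sizes[a] * sizes[b]
            if J[a][b] > cap:
                return False

        # Edge endpoints landing in class a must equal its total degree;
        # an edge inside the class contributes two endpoints.
        endpoints = 2 * J[a][a] + sum(J[a][b] for b in range(k) if b != a)
        if endpoints != sizes[a] * degrees[a]:
            return False

    return True


if __name__ == "__main__":
    # Path on three vertices: two endpoints of degree 1, one middle of degree 2.
    print(jdm_counting_checks([2, 1], [1, 2], [[0, 2], [2, 0]]))
```

A specification that fails any of these checks cannot be realized by a simple graph; passing them is treated here only as a screening step before attempting a construction.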

Related resources

MMDS 2008: Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis, Part I

The 2008 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2008), held at Stanford University, June 25–28, had two goals: first, to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly structured scientific and Internet data sets, and second, to bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to...

Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis

We provide a report for the ACM SIGKDD community about the 2008 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2008), its origin in MMDS 2006, and future directions for this interdisciplinary research area.

MMDS 2014: Workshop on Algorithms for Modern Massive Data Sets

The 2014 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2014) will address algorithmic and statistical challenges in modern large-scale data analysis. The goals of MMDS 2014 are to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and to bring together computer scientists, statisticians, mathem...

MMDS 2008: Algorithmic and Statistical Challenges in Modern Large-Scale Data Analysis are the Focus

The 2008 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2008), sponsored by the NSF, DARPA, LinkedIn, and Yahoo!, was held last year at Stanford University, June 25–28, 2008. The goals of MMDS 2008 were (1) to explore novel techniques for modeling and analyzing massive, high-dimensional, and nonlinearly-structured scientific and internet data sets; and (2) to bring together computer ...

Computation in Large-Scale Scientific and Internet

A report is provided for the ACM SIGKDD community about the 2010 Workshop on Algorithms for Modern Massive Data Sets (MMDS 2010), its origin in MMDS 2006 and MMDS 2008, and future directions for this interdisciplinary research area.

Publication date: 2009